pipeline: filters: lookup added new filter page #1953
base: master
Conversation
alexakreizinger left a comment
thanks for incorporating those changes @olegmukhin (and thank you for opening a PR in the first place—I meant to say that earlier but forgot 😅)
I'm approving this but I'll let you merge it... not sure if we're waiting on corresponding code changes before the docs are ready to go live :)
New documentation page outlines description, example configuration and CSV handling for the new LookUp filter. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Updated inputs based on code changes. Added metrics section. Made key considerations clearer with a separate section. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Walkthrough: Adds comprehensive documentation for a new Lookup filter: configuration fields (`data_source`, `lookup_key`, `result_key`, `ignore_case`, `skip_header_row`), examples (Fluent Bit YAML and config formats), sample input/output, CSV matching rules, metrics, and operational notes; also updates the docs SUMMARY/TOC.
Sequence diagram(s): The changes are documentation-only; no control-flow or runtime behavior changes to diagram.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
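Based on the fields listed in the walkthrough, a minimal Lookup filter configuration might look like the sketch below. The match pattern, file path, and key names are illustrative placeholders, not values taken from the PR:

```yaml
pipeline:
  filters:
      # Hypothetical example: enrich records whose "hostname" field matches
      # a key in column 1 of the CSV, writing column 2 into "owner".
    - name: lookup
      match: '*'
      data_source: /etc/fluent-bit/owners.csv   # placeholder path
      lookup_key: hostname
      result_key: owner
      ignore_case: true
      skip_header_row: true
```

The same settings should translate directly to the classic config format, which the documented examples also cover.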
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
pipeline/filters/lookup.md (1)
143-152: Fix grammar: hyphenate "key-value" in compound adjective. Line 145 should use a hyphen to join the words in the compound adjective.
Apply this diff to fix the grammar issue:

- The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
+ The CSV is used to create an in-memory key-value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.

While refactoring, consider improving clarity: the phrase "All other columns in the CSV are ignored" could be stronger. Consider: "Any additional columns are ignored and not loaded."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pipeline/filters/lookup.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
pipeline/filters/lookup.md
[grammar] ~145-~145: Use a hyphen to join words.
Context: ...e CSV is used to create an in-memory key value lookup table. Column 1 of the CSV ...
(QB_NEW_EN_HYPHEN)
🔇 Additional comments (6)
pipeline/filters/lookup.md (6)
1-3: Documentation intro is clear and accurate. The description succinctly explains the core lookup behavior without overcomplicating the specifics of record accessors and CSV column handling. ✓
14-15: Clarify `skip_header_row` parameter behavior vs. past implementation notes.
Past review comments indicated "The filter always skips the first row." However, the `skip_header_row` parameter now defaults to `false`, which would mean the first row is processed as data by default. This contradicts the earlier statement. Please clarify:
- Does the filter always skip the first row regardless of the `skip_header_row` setting?
- Or is `skip_header_row` a new feature that changes default behavior?
- If the default should be `true` to match the original behavior, this should be updated.
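The ambiguity the review raises can be made concrete with a small, purely illustrative loader. This is not the filter's actual implementation, just a sketch of the two possible semantics for the first row:

```python
import csv
import io

def load_table(csv_text, skip_header_row=False):
    """Build a key-value table from CSV text: column 1 is the key,
    column 2 the value, extra columns are ignored. Illustrative only."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if skip_header_row:
        rows = rows[1:]  # drop the first row only when asked to
    return {row[0]: row[1] for row in rows if len(row) >= 2}

data = "hostname,owner\nweb-1,alice\ndb-1,bob\n"

# With the documented default (false), the header row becomes a data entry:
print(load_table(data))                        # includes 'hostname' -> 'owner'
# With skip_header_row=True, the first row is dropped:
print(load_table(data, skip_header_row=True))  # only web-1 and db-1 remain
```

If the original behavior was to always skip the first row, only the second call matches it, which is the reviewer's point about the default.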
17-79: Configuration examples are clear and comprehensive. Both YAML and config formats are shown with proper syntax and all relevant parameters. The examples demonstrate practical usage with input, filter, and output stages. ✓
81-112: Example data is comprehensive and well-chosen. The input records cover important edge cases (various data types, case variations, quoted values, whitespace, missing fields), and the CSV file demonstrates duplicate key handling and quote escaping. This effectively illustrates the filter's behavior. ✓
115-129: Expected output is missing the whitespace-only hostname record.
The input data includes a record with `{"hostname": " "}` (line 95), which should pass through unchanged because it doesn't match any CSV entry. However, this record is missing from the expected output (lines 118–129). Please add it to complete the example. The missing record should appear as:
`{"hostname"=>" "}`
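The pass-through behavior described here can be sketched in a few lines, assuming (both assumptions flagged in comments, neither confirmed by the PR) that values are stringified and whitespace-trimmed before matching:

```python
def lookup(record, table, lookup_key, result_key, ignore_case=False):
    """Illustrative sketch only: apply one lookup to one record.
    Assumptions: values are stringified and whitespace-trimmed before
    matching; table keys are stored lowercased when ignore_case is on."""
    value = record.get(lookup_key)
    if value is None:
        return record                 # no lookup key: pass through
    key = str(value).strip()          # assumed trimming behavior
    if ignore_case:
        key = key.lower()
    if key not in table:
        return record                 # no match: record unchanged
    enriched = dict(record)
    enriched[result_key] = table[key]
    return enriched

table = {"web-1": "alice"}

# A whitespace-only hostname trims to "" and matches nothing,
# so the record passes through unchanged:
print(lookup({"hostname": " "}, table, "hostname", "owner"))  # {'hostname': ' '}
print(lookup({"hostname": "  WEB-1 "}, table, "hostname", "owner", ignore_case=True))
```

Under these assumptions, the whitespace-only record survives untouched, which is why the reviewer expects it in the output example.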
131-141: Metrics documentation is clear and complete. The three counters (processed, matched, skipped) provide good visibility into filter operation, and the note about the "name" label is helpful for monitoring. ✓
Added link from summary.md (overwritten by conflict). Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Minor changes in language as recommended by vale. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
pipeline/filters/lookup.md (1)
143-152: Key considerations are comprehensive; fix compound-adjective hyphenation.
The key considerations section effectively documents important behaviors (whitespace trimming, data type handling, CSV loading, etc.) and aligns with the examples provided.
However, there is a minor grammar issue at line 145: "key value lookup table" should use a hyphen in the compound adjective: "key-value lookup table."
Apply this fix:

- The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
+ The CSV is used to create an in-memory key-value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pipeline/filters/lookup.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
pipeline/filters/lookup.md
[grammar] ~145-~145: Use a hyphen to join words.
Context: ...e CSV is used to create an in-memory key value lookup table. Column 1 of the CSV ...
(QB_NEW_EN_HYPHEN)
🔇 Additional comments (7)
pipeline/filters/lookup.md (7)
1-4: Clear introduction and description.The header and opening description accurately convey the filter's purpose and behavior. This aligns well with the approved suggestions from prior review.
5-16: Configuration parameters table is clear and complete.Parameter descriptions are detailed and align with prior reviewer suggestions, especially the clarifications around
ignore_caseandskip_header_rowbehavior. The table includes helpful examples for record accessor syntax.
17-79: Example configurations are accurate and well-structured.Both YAML and config format examples are consistent, realistic, and properly demonstrate the filter's configuration with all key parameters.
81-96: Comprehensive input examples covering edge cases.The input records effectively demonstrate various data types (strings, numbers, booleans, objects, arrays) and edge cases (whitespace, quotes) that the filter will encounter.
98-111: CSV example is representative and well-structured.The CSV data effectively demonstrates various scenarios including quoted values, escaped quotes, duplicate keys, and matches for the input examples. The header row and varied data types support the documented behavior.
113-129: Verify completeness of output example and quote escaping.The output examples are helpful, but there appear to be two issues to verify:
Missing output record: The input examples (lines 84-96) contain 12 records, but the output examples show only 10 records. The record with
"hostname": " "(line 95) is missing from the output. If this record passes through unchanged (because a whitespace-only value doesn't match any CSV entry after trimming), it should still appear in the output. Please confirm whether this record should be included.Quote escaping accuracy (line 124): The CSV contains
"quoted ""host"""(line 109) with CSV-style escaped quotes (two consecutive quotes representing a single literal quote). The output shows"quoted "host"". Please verify this represents the correct filter behavior and output format.
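The escaping convention the reviewer describes, doubling a quote character inside a quoted field, is standard CSV (RFC 4180) behavior and can be sanity-checked with Python's standard `csv` module, which follows the same rule:

```python
import csv
import io

# Parse the CSV field from the example: two consecutive double quotes
# inside a quoted field decode to one literal double quote.
row = next(csv.reader(io.StringIO('"quoted ""host""",value\n')))
print(row[0])  # quoted "host"
```

So a CSV field written as `"quoted ""host"""` decoding to `quoted "host"` in the filter output would be consistent with standard CSV parsing.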
131-141: Metrics section is clear and follows conventions.
The three metrics appropriately track filter activity (processed, matched, skipped records), follow Prometheus naming conventions, and include helpful descriptions. The `name` label for instance identification is good for multi-filter scenarios.
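Assuming the filter's counters are registered with Fluent Bit's metrics subsystem (an assumption, since the PR covers docs only), they would be observable through the built-in monitoring server, enabled in the service section. The port and endpoint below are Fluent Bit's documented defaults, not values introduced by this PR:

```yaml
service:
  http_server: on        # enables the built-in monitoring endpoint
  http_listen: 0.0.0.0
  http_port: 2020        # Fluent Bit's default monitoring port
# Prometheus-format metrics, including filter counters, are then
# typically available at:
#   http://127.0.0.1:2020/api/v2/metrics/prometheus
```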
Fixed broken record accessor link. Minor grammar enhancement. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
New filter documentation page outlines the description, example configuration, and CSV handling for the new LookUp filter, to be implemented as part of fluent/fluent-bit/pull/10620.